Detection of candidate genes affecting milk production traits in sheep using whole‐genome sequencing analysis

Abstract Background Artificial and natural selection for important economic traits and genetic adaptation of the populations to specific environments have led to the changes on the sheep genome. Recent advances in genome sequencing methods have made it possible to use comparative genomics tools to identify genes under selection for traits of economic interest in domestic animals. Objectives In this study, we compared the genomes of Assaf and Awassi sheep breeds with those of the Cambridge, Romanov and British du cher sheep breeds to explore positive selection signatures for milk traits using nucleotide diversity (Pi) and FST statistical methods. Methods Genome sequences from fourteen sheep with a mean sequence depth of 9.32X per sample were analysed, and a total of 23 million single nucleotide polymorphisms (SNPs) were called and applied for this study. Genomic clustering of breeds was identified using ADMIXTURE software. The FST and Pi values for each SNP were computed between population A (Assaf and Awassi) and population B (Cambridge, British du cher, and Romanov). Results The results of the PCA grouped two classes for these five dairy sheep breeds. The selection signatures analysis displayed 735 and 515 genes from FST and nucleotide diversity (Pi) statistical methods, respectively. Among all these, 12 genes were shared between the two approaches. The most conspicuous genes were related to milk traits, including ST3GAL1 (the synthesis of oligosacáridos), CSN1S1 (milk protein), CSN2 (milk protein), OSBPL8 (fatty acid traits), SLC35A3 (milk fat and protein percentage), VPS13B (total milk production, fat yield, and protein yield), DPY19L1 (peak yield), CCDC152 (lactation persistency and somatic cell count), NT5DC1 (lactation persistency), P4HTM (test day protein), CYTH4 (FAT Production) and METRNL (somatic cell), U1 (milk traits), U6 (milk traits) and 5S_RRNA (milk traits). Conclusions The findings provide new insight into the genetic basis of sheep milk properties and can play a role in designing sheep breeding programs incorporating genomic information.


INTRODUCTION
The sheep was one of the first herbivores to be partially domesticated due to its controllable size and ability to adapt to different climates and diets with poor nutrition. After artificial selection, different breeds were formed with different morphology, coat colour or main production (milk or wool and meat). For the last few decades, reproductive breeding methods using calculated breeding values of reproductive traits from pedigree and phenotype information have significantly increased sheep performance (Goddard & Hayes 2009;Niemann & Kues 2003).
In recent years, advances in high-throughput genome scanning methods, chiefly whole-genome sequencing, SNP genotyping array, and comparative genomic hybridisation (CGH) arrays, have allowed identifying causal genes and variants related to different traits. Compared to the other two methods, the next-generation sequencing method has been proven to be a valuable strategy for detecting functional genes and genetic variants related to important economic traits in sheep (Jiang et al., 2014).
Sheep milk is a notable source of income, computing for a large part of universal milk production, and is generally applied to yield many cul- tured dairy supplies, like cheese. Milk traits can be defined by calculating yield, fat, lactose percentage and protein. Because local multipurpose breeds mainly produce sheep's milk with low to moderate milk outputs, few sheep breeds have been expanded obviously to produce milk. To date, only a few studies including GWAS and selective sweep experiments have been performed to identify genomic regions affecting milk traits in sheep using the SNP array (Eydivandi et al., 2021;Garcia-Gamez et al., 2012;Gutiérrez-Gil et al., 2014;Moioli et al., 2013;Rupp et al., 2015;Yuan et al., 2019) and RNA-Sequencing methods (Giordani et al., 2017;Suarez-vega et al., 2015;Suarez-vega et al., 2016).
A positive selection signature analysis has previously been used to detect the genes/gene regions influenced by selection using genome comparison of different phenotypic breeds. For this reason, the sequencing of the reference genome assemblies of the sheep (Ovisaries) simplified efforts to realise the genetic basis of phenotypic differences of several sheep breeds with re-sequencing methods.
Various methods have been expanded and applied to distinguish the positive selection signatures in the genomes of different livestock species or breeds. Some are based on significantly reduced nucleotide diversity levels, long-range haplotype homozygosity (Sabeti et al., 2007) and high population differentiation (Bahbahani et al., 2015;Porto-Neto et al., 2014). Information on the genetic diversity of these genes in different sheep populations can be interesting. For example, evidence of selection signatures around these genes could indicate a potential direct effect of the desired gene on phenotypes of interest for the dairy sheep industry. In this research, nucleotide diversity (Pi) and FST statistical methods were applied to compare the genomes of Assaf and Awassi breeds with Cambridge, Romanov and British du Cher breeds in order to explore positive selection signatures for milk traits and to find new candidate genes in the genomic regions under selection.

Sequencing data and mapping
We utilised 14 whole genomes, including an Assaf breed (acces-

Studied population
The Assaf breed is the result of crossbreeding the Awassi and East Friesian from Israel and is raised for milk and meat. The Awassi originated from the Syro-Arabian desert in South-West Asia and primarily is raised for milk.

SNP detection
To increase and improve the quality score of each base, base quality scores were re-evaluated using the GATK3.4-46-gbc02625 program (McKenna et al., 2010). SNPs of all individuals were identified and filtered using GATK Unified Genotyper and GATK Variant Filtration, respectively.

Analyses of the regions of homozygosity
Runs of homozygosity (ROH) segments are introduced as contiguous homozygous regions of the genome, which can provide an insight into a population's history, genomic inbreeding, and signatures of selection (Ceballos et al., 2018;Purfield et al., 2012). The extent and length of these ROHs are different among individuals and populations (Curik et al., 2014;Purfield et al., 2012). We obtained ROH segments for each breed using the 'runs of homozygosity' function (-homozyg-kb) for three length categories (more than 100, 200 and 300 kb) in the program PLINK v1.07 (Purcell et al., 2007). It should be noted that we used default parameters for other adjustments.

Genetic structure analysis
Genomic clustering of breeds was identified using ADMIXTURE software (Alexander & Lange, 2011) with a maximum likelihood approach.
The admixture runs were done for K values of 1-4. Barplots were produced for K values in the dataset. In addition, the different kinds of files (bim, bed and fam files) were created in PLINK software to draw possible population/genetic structure between and within different populations under study. Then these files were handled in an Rprogramming environment (http://www.R-project.org) for eigenvector formation and PCA plotting along two axes. The first two principal components (PCs) (i.e., PC 1 and PC 2) were concurrently calculated for each dataset. In the end, these PCs were graphically plotted against each other in two axes. Also, Nei's genetic distances (Nei, 1972) among different populations were applied to build a neighbour-joining (NJ) tree ingdsfmt and NPRelate (Zheng et al., 2012;Zheng et al., 2017) packages of R.

Selective sweep region annotation and gene functional enrichment analysis
Candidate selective sweeps discovered with the approaches mentioned above were annotated using the Variant Effect Predictor existing at http://asia.ensembl.org/info/docs/tools/index.html.
Functional enrichment analysis was performed using the 'g:Profiler' enrichment analysis tool to identify their biological functions (Reimand et al., 2011). And the p value of the gene enrichment was adjusted by Benjamini-Hochberg FDR (false-discovery rate).

Output mapping of five sheep breeds
Our efforts produced a mean sequence depth of 9.32× per sample, within a range of 3.82-to 12.7-fold (Table 1). A total of 23 million SNPs were detected in all individuals.

Runs of homozygosity
For all breeds, shorter ROHs (100 kb) were more frequent than longer ROHs (

Positive selection signature
Using nucleotide diversity (Pi) and FST statistical methods, we compared the genomes of Assaf and Awassi breeds with Cambridge,

F I G U R E 3 Admixture output for four value of k exanimated
Romanov and British du cher breeds to explore positive selection signatures. The first percentile rank was applied as a threshold to detect candidates all over the analysis. Several protein-coding genes with significant higher FST (735 genes, Table S1) and a lower value for nucleotide diversity (Pi, 515 genes, Table S2) were detected and examined as potential candidate genes that endured selection during domestication. Among all these, 12 genes were shared between two of the methods ( Figure S1). Gene ontology (GO) analysis for the genes showing signatures of positive selection is reported in the table (Table   S3). All the terms related to sensory perception are over-represented (p < 0.01). Potential candidate genes for milk traits are listed in the

DISCUSSION
The Cambridge breed appeared as an independent cluster for K values of 2-. At K = 3, Assaf, Awassi and British du cher breeds were clustered the same, while 75% of the Romanov population were in the separate group. At K = 4, the Assaf and Awassi breeds had the same cluster. In contrast, 50% of the Romanov population had the same cluster with British du cher and another 50% of the Romanov population clustered in a separate group. Analysis of genetic structure confirmed a close genetic relationship between two dairy sheep breeds, Assaf and Awassi. These results were in agreement with the fact that the Assaf sheep were created by crossing the East Friesian sheep to Awassi, which both have high milk and high lamb production capacity (Gootwine & Goot, 1996). The Cambridge breed had an independent cluster for K values of 2-4. It is reasonable because the Cambridge breed is a crossbred sheep from mating between Landrace rams with many local ewes in Cambridge's university. Results of admixture analysis at K = 3 (with the least CV error), similar to a previous study (Meadows et al., 2005), indicated that sheep have the weakest population structure and the highest intercontinental dispersal compared to other domestic animals studied to date. The frequency of ROHs across the genome can present information about the history and the management of flocks over time (Kirin et al., 2010). The mean number of RoHs above 100, 200 and 300 Kb per population is reported in Figure 5.
We observed a decreasing trend from short to long lengths. Longer ROH may indicate recent inbreeding has occurred within a pedigree, Two genes identified simultaneously by the applied approaches, U1 and U6 genes, encode small nuclear RNAs (snRNAs) that play a role in RNA splicing. The 5S ribosomal RNA gene encodes 5SrRNA, a structural and functional component of the large subunit of the ribosome.
It has been previously reported to be positively selected for milk traits (Taye et al., 2017).
The population differentiation analysis (FST) revealed GO terms related to different biological processes. For example, the term 'regulation of homeostatic process' was enriched to which genes for milk traits (P4HTM and METRNL) were involved. P4HTM and METRNL genes were formerly presented associated with the test day protein (Ibeagha-Awemu et al., 2016) and somatic cell (Chen et al., 2015) in milk, respectively. Other genes associated with milk traits, CSN1S1 and CSN2, were found to be selection candidates (top 1% Pi). Previous studies reported that polymorphisms in these milk proteins impact milk production parameters and protein quality (Caravaca et al., 2008;Kishore et al., 2013). Casein alpha S1 protein is encoded by the CSN1S1 gene and has an important role in the milk capacity for calcium phosphate transport. CSN2 gene encodes beta-casein, the main protein in milk and is the primary source of essential amino acids for infants. ST3GAL1 gene has been previously reported to be responsible for the production of 3′-sialylactose in colostrum and late lactation milk in goats (Crisà et al., 2016).
A polymorphism in the SLC35A3 gene has been found to be associated with fat yield and protein production (Liu et al., 2018). It has been reported that the VPS13B gene influences total milk production, fat yield and protein production (Liu et al., 2018). CCDC152 (Chen et al., 2015;Do et al., 2017;Zhao et al., 2015) and NT5DC1 genes (Yodklaew et al., 2017) have previously been shown to affect lactation persistency. OSBPL8 gene encodes Oxysterol-binding proteinrelated eight proteins, a group of intravenous lipid receptors. It is found to affect the fatty acid trait in milk . The protein encoded by the DPY19L1 gene has a role in transferring glycosyl groups, and a polymorphism in this gene has been reported to be associated with peak yield in dairy cattle (Yodklaew et al., 2017). ARF-GAP3 gene has been found to affect fat production (De Camargo et al., 2015).

CONCLUSION
Whole-genome sequencing analysis is a useful method for detecting genes that functionally and globally are associated with different phenotypes in domesticated animals. The current study aimed to identify important genomic regions and genes associated with milk production traits in dairy sheep breeds, including Assaf and genes. Based on the biological function of these genes in the candidate selected regions, several genes, including CSN1S1, CSN2 and SLC35A3, have possibly a major role in milk composition and production in sheep breeds. However, the identified novel candidate genes in the selected regions need to be validated in independent studies to provide a more profound insight into the genetic architecture of Ovis aries milk yield and composition.

CONFLICT OF INTEREST
There is no conflict of interest for this research

ETHICAL STATEMENT
We used the already available data and based on our institute policy in this case, no ethics review was required. Based on the policy of the animal ethics committee, Shahid Bahonar University of Kerman, this study did not require ethical approval because the study used the datasets freely available in the public.

PEER REVIEW
The peer review history for this article is available at https://publons. com/publon/10.1002/vms3.731